From Peptidome to PRIDE: Public proteomics data migration at a large scale
Authors
Abstract
The PRIDE database, developed and maintained at the European Bioinformatics Institute (EBI), is one of the most prominent data repositories dedicated to high-throughput MS-based proteomics data. Peptidome, developed by the National Center for Biotechnology Information (NCBI) as a sibling resource to PRIDE, was discontinued due to funding constraints in April 2011. A joint effort between the two teams was started soon after the Peptidome closure to ensure that the data were not "lost" to the wider proteomics community by exporting them to PRIDE. As a result, data in the low terabyte range have been migrated from Peptidome to PRIDE and made publicly available under experiment accessions 17900-18271, representing 54 projects, ~53 million mass spectra, ~10 million peptide identifications, ~650,000 protein identifications, ~1.1 million biologically relevant protein modifications, and 28 species, from more than 30 different labs.
Similar resources
Resilience in the proteomics data ecosystem: how the field cares for its data.
The public dissemination of data is an integral part of the life sciences. In the field of proteomics too, data sharing has taken off over the last few years, with the first downstream uses of these data quickly gaining prominence. At the same time, the recent unfortunate demise of two repositories, NCBI Peptidome and ProteomeCommons Tranche, has shown the frailty of such data gathering efforts...
NCBI Peptidome: a new repository for mass spectrometry proteomics data
Peptidome is a public repository that archives and freely distributes tandem mass spectrometry peptide and protein identification data generated by the scientific community. Data from all stages of a mass spectrometry experiment are captured, including original mass spectra files, experimental metadata and conclusion-level results. The submission process is facilitated through acceptance of dat...
PRIDE: Quality control in a proteomics data repository
The PRoteomics IDEntifications (PRIDE) database is a large public proteomics data repository, containing over 270 million mass spectra (by November 2011). PRIDE is an archival database, providing the proteomics data supporting specific scientific publications in a computationally accessible manner. While PRIDE faces rapid increases in data deposition size as well as number of depositions, the m...
PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets
The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data before data submission or already publicly available in the Proteomics Identifications (PRIDE) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and a direc...
PRIDE: Quick tour
PRIDE is a core member in the ProteomeXchange (PX) consortium [6], which provides a standard framework for the submission and dissemination of mass spectrometry (MS)-based proteomics data to public-domain repositories. PRIDE acts as the initial submission point of MS/MS data. Datasets are then submitted to ProteomeXchange via PRIDE and are handled by expert biocurators. PRIDE also integrates th...